AI-Powered Citation Auditing: A Zero-Assumption Protocol for Systematic Reference Verification in Academic Research

van Rensburg, L. J. Janse

arXiv.org Artificial Intelligence

Academic citation integrity faces persistent challenges: research indicates that roughly 20% of citations contain errors, and manual verification can require months of expert time. This paper presents a novel AI-powered methodology for systematic, comprehensive reference auditing using agentic AI with tool-use capabilities. We develop a zero-assumption verification protocol that independently validates every reference against multiple academic databases (Semantic Scholar, Google Scholar, CrossRef) without assuming any citation is correct. The methodology was validated across 30 academic documents (2,581 references) spanning undergraduate projects through doctoral theses and peer-reviewed publications. Results demonstrate a 91.7% average verification rate on published PLOS papers, with successful detection of fabricated references, retracted articles, orphan citations, and predatory journals. Time efficiency improved dramatically: a 90-minute audit of a 916-reference doctoral thesis versus months of manual review. The system achieved a <0.5% false positive rate while identifying critical issues that manual review might miss. This work establishes the first validated AI-agent methodology for academic citation integrity, demonstrating practical applicability for supervisors, students, and institutional quality assurance.
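One step in a zero-assumption audit of this kind is deciding whether a cited title actually matches the record returned by a metadata database such as CrossRef. The sketch below shows that matching step only; the function names and the 0.9 similarity threshold are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: fuzzy title matching for reference verification.
# A reference counts as verified only if the cited title closely matches
# the title retrieved from a database; otherwise it is flagged for review.
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(title.lower().split())


def titles_match(cited: str, retrieved: str, threshold: float = 0.9) -> bool:
    """Return True when similarity clears the (illustrative) threshold."""
    ratio = SequenceMatcher(None, normalize(cited), normalize(retrieved)).ratio()
    return ratio >= threshold
```

In practice the retrieved title would come from a database query, and anything below the threshold would be routed to a human auditor rather than rejected outright.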


Automating Thematic Review of Prevention of Future Deaths Reports: Replicating the ONS Child Suicide Study using Large Language Models

Osian, Sam, Dutta, Arpan, Bhandari, Sahil, Buchan, Iain E., Joyce, Dan W.

arXiv.org Artificial Intelligence

Prevention of Future Deaths (PFD) reports, issued by coroners in England and Wales, flag systemic hazards that may lead to further loss of life. Analysis of these reports has previously been constrained by the manual effort required to identify and code relevant cases. In 2025, the Office for National Statistics (ONS) published a national thematic review of child-suicide PFD reports ($\leq$ 18 years), identifying 37 cases from January 2015 to November 2023 - a process based entirely on manual curation and coding. We evaluated whether a fully automated, open-source "text-to-table" language-model pipeline (PFD Toolkit) could reproduce the ONS's identification and thematic analysis of child-suicide PFD reports, and assessed gains in efficiency and reliability. All 4,249 PFD reports published from July 2013 to November 2023 were processed via PFD Toolkit's large language model pipelines. Automated screening identified cases where the coroner attributed death to suicide in individuals aged 18 or younger, and eligible reports were coded for recipient category and 23 concern sub-themes, replicating the ONS coding frame. PFD Toolkit identified 72 child-suicide PFD reports - almost twice the ONS count. Three blinded clinicians adjudicated a stratified sample of 144 reports to validate the child-suicide screening. Against the post-consensus clinical annotations, the LLM-based workflow showed substantial to almost-perfect agreement (Cohen's $\kappa$ = 0.82, 95% CI: 0.66-0.98, raw agreement = 91%). The end-to-end script runtime was 8m 16s, transforming a process that previously took months into one that can be completed in minutes. This demonstrates that automated LLM analysis can reliably and efficiently replicate manual thematic reviews of coronial data, enabling scalable, reproducible, and timely insights for public health and safety. The PFD Toolkit is openly available for future research.
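The agreement statistic reported above, Cohen's kappa, corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch of the computation (function name and label encoding are illustrative, not taken from the PFD Toolkit):

```python
# Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e), where p_o is
# observed agreement and p_e is chance agreement from marginal label rates.
def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items both raters labelled the same.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of each rater's marginal rate per label.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 0.82, as reported, sits in the conventional "almost perfect" band (above 0.8), whereas a kappa near zero means agreement is no better than chance.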


CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation

Yin, Wenjing, Sun, Tianze, Yu, Yijiong, Fang, Jiawei, Su, Guangyao, Wang, Jiancheng, Wang, Zekun, Wang, Wei, Chen, Ran, Dai, Ziyun, Yuan, Shuai, Dong, Menghang, Luo, Peng, Cao, Dong, Lei, Da, Zhang, Yajun, Chen, Hao, Ma, Xiang, Liu, Yong, Liu, Weifeng, Xu, Yuanjian, Pei, Ji

arXiv.org Artificial Intelligence

Large language models (LLMs) play a crucial role in software engineering, excelling in tasks like code generation and maintenance. However, existing benchmarks are often narrow in scope, focusing on a specific task and lack a comprehensive evaluation framework that reflects real-world applications. To address these gaps, we introduce CoCo-Bench (Comprehensive Code Benchmark), designed to evaluate LLMs across four critical dimensions: code understanding, code generation, code modification, and code review. These dimensions capture essential developer needs, ensuring a more systematic and representative evaluation. CoCo-Bench includes multiple programming languages and varying task difficulties, with rigorous manual review to ensure data quality and accuracy. Empirical results show that CoCo-Bench aligns with existing benchmarks while uncovering significant variations in model performance, effectively highlighting strengths and weaknesses. By offering a holistic and objective evaluation, CoCo-Bench provides valuable insights to guide future research and technological advancements in code-oriented LLMs, establishing a reliable benchmark for the field.


PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping

Pungitore, Sarah, Yadav, Shashank, Subbian, Vignesh

arXiv.org Artificial Intelligence

Computational phenotyping is essential for biomedical research but often requires significant time and resources, especially since traditional methods typically involve extensive manual data review. While machine learning and natural language processing advancements have helped, further improvements are needed. Few studies have explored using Large Language Models (LLMs) for these tasks despite known advantages of LLMs for text-based tasks. To facilitate further research in this area, we developed an evaluation framework, Evaluation of PHEnotyping for Observational Health Data (PHEONA), that outlines context-specific considerations. We applied and demonstrated PHEONA on concept classification, a specific task within a broader phenotyping process for Acute Respiratory Failure (ARF) respiratory support therapies. From the sample concepts tested, we achieved high classification accuracy, suggesting the potential for LLM-based methods to improve computational phenotyping processes.


AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County

Surani, Faiz, Suzgun, Mirac, Raman, Vyoma, Manning, Christopher D., Henderson, Peter, Ho, Daniel E.

arXiv.org Artificial Intelligence

Legal reform can be challenging in light of the volume, complexity, and interdependence of laws, codes, and records. One salient example of this challenge is the effort to restrict and remove racially restrictive covenants, clauses in property deeds that historically barred individuals of specific races from purchasing homes. Despite the Supreme Court holding such racial covenants unenforceable in 1948, they persist in property records across the United States. Many jurisdictions have moved to identify and strike these provisions, including California, which mandated in 2021 that all counties implement such a process. Yet the scale can be overwhelming, with Santa Clara County (SCC) alone having over 24 million property deed documents, making purely manual review infeasible. We present a novel approach to addressing this pressing issue, developed through a partnership with the SCC Clerk-Recorder's Office. First, we leverage an open large language model, finetuned to detect racial covenants with high precision and recall. We estimate that this system reduces manual efforts by 86,500 person hours and costs less than 2% of the cost for a comparable off-the-shelf closed model. Second, we illustrate the County's integration of this model into responsible operational practice, including legal review and the creation of a historical registry, and release our model to assist the hundreds of jurisdictions engaged in similar efforts. Finally, our results reveal distinct periods of utilization of racial covenants, sharp geographic clustering, and the disproportionate role of a small number of developers in maintaining housing discrimination. We estimate that by 1950, one in four properties across the County was subject to racial covenants.


More efficient manual review of automatically transcribed tabular data

Pedersen, Bjørn-Richard, Johansen, Rigmor Katrine, Holsbø, Einar, Sommerseth, Hilde, Bongo, Lars Ailo

arXiv.org Artificial Intelligence

Machine learning methods have proven useful in transcribing historical data. However, results from even highly accurate methods require manual verification and correction. Such manual review can be time-consuming and expensive; the objective of this paper was therefore to make it more efficient. Previously, we used machine learning to transcribe 2.3 million handwritten occupation codes from the Norwegian 1950 census with high accuracy (97%). We manually reviewed the 90,000 (3%) codes with the lowest model confidence. We allocated those 90,000 codes to human reviewers, who used our annotation tool to review the codes. To assess reviewer agreement, some codes were assigned to multiple reviewers. We then analyzed the review results to understand the relationship between accuracy improvements and effort. Additionally, we interviewed the reviewers to improve the workflow. The reviewers corrected 62.8% of the labels and agreed with the model label in 31.9% of cases. About 0.2% of the images could not be assigned a label, while for 5.1% the reviewers were uncertain or assigned an invalid label. 9,000 images were independently reviewed by multiple reviewers, resulting in an agreement of 86.43% and disagreement of 8.96%. We learned that our automatic transcription is biased towards the most frequent codes, with a higher degree of misclassification for the lowest-frequency codes. Our interview findings show that the reviewers performed internal quality control and found our custom tool well suited. Thus, only one reviewer is needed, but they should report uncertainty.
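The selection strategy described above, routing only the lowest-confidence fraction of predictions to human reviewers, can be sketched in a few lines. The function name and the 3% default are illustrative, mirroring the paper's setup rather than reproducing its code.

```python
# Hypothetical sketch: pick the lowest-confidence predictions for manual
# review, leaving high-confidence transcriptions untouched.
def select_for_review(confidences, fraction=0.03):
    """Return indices of the lowest-confidence items (at least one)."""
    k = max(1, round(len(confidences) * fraction))
    # Sort indices by ascending confidence; the front of the list is the
    # slice the model is least sure about.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:k]
```

On 2.3 million codes with fraction=0.03, this yields roughly the 90,000-item review queue the study describes; tuning the fraction trades reviewer effort against residual error.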


Protecting payments in an era of deepfakes and advanced AI

#artificialintelligence

In the midst of unprecedented volumes of e-commerce since 2020, the number of digital payments made every day around the planet has exploded – hitting about $6.6 trillion in value last year, a 40 percent jump in two years. With all that money flowing through the world's payments rails, there's even more reason for cybercriminals to innovate ways to nab it. To help ensure payments security today requires advanced game theory skills to outthink and outmaneuver highly sophisticated criminal networks that are on track to steal up to $10.5 trillion in "booty" via cybersecurity damages, according to a recent Argus Research report. Payment processors around the globe are constantly playing against fraudsters and improving upon "their game" to protect customers' money. The target invariably moves, and scammers become ever more sophisticated.


3 Ways IQ Bot Enables Financial Process Automation

#artificialintelligence

Most researchers agree approximately 80% of any organization's data is hidden in multiple sources, such as emails, PDF application forms, and paper documents. This data often goes unused because of the time and resources required to get meaningful information from it. Even then, the data often needs to be manually rekeyed into multiple locations. Combined with Robotic Process Automation (RPA), IQ Bot enables banking, financial services, and insurance (BFSI) companies to take advantage of intelligent document processing (IDP) to extract valuable data and streamline operations. IQ Bot blends multiple artificial intelligence (AI) technologies, such as computer vision, machine learning, and natural language processing (NLP), to glean relevant information from any type of document, add structure to the data, and deliver the results to multiple applications.


The future of AI in finance is here: Reducing the cost of accuracy

#artificialintelligence

Artificial intelligence and machine learning (AI/ML) have already transformed industries and changed the way work gets done across the enterprise. While finance has traditionally lagged behind other departments in the AI adoption curve, that's starting to change. Adoption of AI in finance is being spurred by digital natives (professionals who grew up in a connected world), with tech solutions finally delivering on the promise of AI/ML. Finance professionals accustomed to modern technology experiences in other areas of their lives are no longer willing to endure painstaking manual reviews and the threat of inaccurate data in their forecasts and plans. Outside of finance, many other areas of the business are well past cutting their teeth when it comes to using AI to improve forecasting and drive decision-making.


eCommerce, Delivery And The Gig Economy Create Opportunities For Both Fraud And The Artificial Intelligence To Detect It

#artificialintelligence

The first area most people think of with fraud is finance. That extends beyond scammers to a wide range of attacks, including those on banking and trading. There has been much discussion of how artificial intelligence (AI) is being used to address wider areas of fraud, such as pharmaceutical prescription fraud. Last year saw phenomenal growth in the use of online marketplaces and delivery services, and fraud in those areas grew along with it.